Preamble

Statistics - Freedman, Pisani and Purves

  • Chapter 1 Controlled experiments
  • Chapter 2 Observational studies
  • Chapter 3 The histogram
  • Chapter 4 The average and the standard deviation
  • Chapter 5 The normal approximation for data
  • Chapter 6 Measurement error
  • Chapter 7 Plotting points and lines
  • Chapter 8 Correlation
        1. The scatter diagram
        2. The correlation coefficient
        3. The SD line
  • Chapter 9 More about correlation

Correlation

The scatter plot

Review

  • Frequency tables
  • Histograms & barplots
  • Summary measures

What if we have two variables?

  • How do X and Y relate to each other?

What if we have two variables?

Quantitative vs Quantitative

\[\begin{array}{c|cc} \hline i & X_i & Y_i \\ \hline 1 & 9.4 & 11.2 \\ 2 & 9.8 & 10.4 \\ 3 & 11.6 & 10.4 \\ 4 & 10.1 & 10.1 \\ 5 & 10.1 & 9.4 \\ 6 & 11.7 & 11.8 \\ 7 & 10.5 & 10.5 \\ 8 & 8.7 & 8.0 \\ 9 & 9.3 & 10.7 \\ 10 & 9.6 & 9.5 \\ \hline \end{array} \]

  • Height and weight for individuals.
  • Testosterone levels and race times for athletes.
  • Sun exposure and number of branches for plants.

What if we have two variables?

Qualitative vs Quantitative

\[\begin{array}{c|cc} \hline i & X_i & Y_i \\ \hline 1 & A & 11.7 \\ 2 & B & 10.5 \\ 3 & A & 8.7 \\ 4 & B & 9.3 \\ 5 & B & 9.6 \\ 6 & A & 11.2 \\ 7 & B & 10.4 \\ 8 & B & 10.4 \\ 9 & B & 10.1 \\ 10 & A & 9.4 \\ \hline \end{array} \]

  • Eye color and vision scores for individuals.
  • Industry and share price volatility for businesses.
  • State and student ATAR scores for universities.
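One simple way to look at a qualitative X against a quantitative Y is to summarise Y within each category. The sketch below is only an illustration: it copies the ten rows from the table above, and the choice of group means and side-by-side boxplots is an assumption about what is useful here, not something the slides prescribe. The names `x`, `y` and `group_means` are made up for the example.

```r
# Data copied from the table above: X is a category, Y is a measurement.
x <- factor(c("A", "B", "A", "B", "B", "A", "B", "B", "B", "A"))
y <- c(11.7, 10.5, 8.7, 9.3, 9.6, 11.2, 10.4, 10.4, 10.1, 9.4)

# Average of Y within each category of X.
group_means <- tapply(y, x, mean)
print(group_means)

# Side-by-side boxplots show the spread of Y in each group.
boxplot(y ~ x, xlab = "X", ylab = "Y")
```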

What if we have two variables?

Qualitative vs Qualitative

\[\begin{array}{c|cc} \hline i & X_i & Y_i \\ \hline 1 & A & Z \\ 2 & B & Y \\ 3 & A & Z \\ 4 & B & Y \\ 5 & B & X \\ 6 & A & Z \\ 7 & B & X \\ 8 & B & X \\ 9 & B & X \\ 10 & A & Z \\ \hline \end{array} \]

  • Political preference and suburb for voters.
  • Favourite music genre and shirt color for music fans.
  • Car brand and shampoo choice for customers.
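For two qualitative variables, the frequency table from the review extends naturally to a two-way table of counts. The following is a minimal sketch using the ten rows of the table above. The variable names `x`, `y` and `tab` are chosen for illustration only.

```r
# Data copied from the table above: both X and Y are categories.
x <- c("A", "B", "A", "B", "B", "A", "B", "B", "B", "A")
y <- c("Z", "Y", "Z", "Y", "X", "Z", "X", "X", "X", "Z")

# Two-way frequency table: rows are levels of X, columns are levels of Y.
tab <- table(X = x, Y = y)
print(tab)
```

Each cell counts how many observations share that combination of categories, so any pattern between X and Y shows up as counts concentrated in a few cells.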

How do X and Y relate to each other?

Quantitative vs Quantitative

\[\begin{array}{c|cc} \hline i & X_i & Y_i \\ \hline 1 & 9.4 & 11.2 \\ 2 & 9.8 & 10.4 \\ 3 & 11.6 & 10.4 \\ 4 & 10.1 & 10.1 \\ 5 & 10.1 & 9.4 \\ 6 & 11.7 & 11.8 \\ 7 & 10.5 & 10.5 \\ 8 & 8.7 & 8.0 \\ 9 & 9.3 & 10.7 \\ 10 & 9.6 & 9.5 \\ \hline \end{array} \]
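With two quantitative variables, each row of the table becomes one point on a scatter plot. Here is a minimal sketch in base R using the ten pairs above. The data frame name `dat` and the axis labels are assumptions made for illustration, and the slides may well have used ggplot2 instead.

```r
# Data copied from the table above (ten paired observations).
x <- c(9.4, 9.8, 11.6, 10.1, 10.1, 11.7, 10.5, 8.7, 9.3, 9.6)
y <- c(11.2, 10.4, 10.4, 10.1, 9.4, 11.8, 10.5, 8.0, 10.7, 9.5)
dat <- data.frame(x = x, y = y)

# One point per row: X on the horizontal axis, Y on the vertical axis.
plot(dat$x, dat$y,
     xlab = "X", ylab = "Y",
     main = "Scatter plot of Y against X",
     pch = 19)
```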

Data example

Drawing

How can we find out if our cereal is "good" to eat?

Scatter plot

  • What is the relationship between Energy (Kilojoules) and the amount of added sugar?

Differences in association

Prediction

Exploring more variables

Summary

We can use scatter plots to visualise the relationship between two variables.

  • We can use our eyes…
        1. to test a hypothesis.
        2. to make a prediction.
        3. to explore a set of data.
  • Caveats
        1. We often see what we want to see.
        2. A plot does not quantify a relationship.
        3. We can only look at so many plots.
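The second caveat points to the next topic in the chapter outline, the correlation coefficient, which puts a single number on the strength of a linear relationship. As a hedged illustration, the sketch below applies base R's `cor` (Pearson correlation by default) to the ten quantitative pairs shown earlier. The variable names are assumptions for the example, and a single number is no substitute for looking at the plot itself.

```r
# Ten paired observations from the quantitative vs quantitative table.
x <- c(9.4, 9.8, 11.6, 10.1, 10.1, 11.7, 10.5, 8.7, 9.3, 9.6)
y <- c(11.2, 10.4, 10.4, 10.1, 9.4, 11.8, 10.5, 8.0, 10.7, 9.5)

# Pearson correlation coefficient: a value between -1 and 1.
r <- cor(x, y)
print(r)
```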

Epilogue

R version

sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8 x64 (build 9200)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] pairsD3_0.1.0 ggplot2_2.1.0 pander_0.6.0  xtable_1.8-2  shiny_0.13.2 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5      knitr_1.13       magrittr_1.5     munsell_0.4.3   
##  [5] colorspace_1.2-6 R6_2.1.2         stringr_1.0.0    plyr_1.8.3      
##  [9] tools_3.2.0      gtable_0.2.0     htmltools_0.3.5  yaml_2.1.13     
## [13] digest_0.6.9     formatR_1.2      htmlwidgets_0.6  evaluate_0.9    
## [17] mime_0.4         rmarkdown_0.9.6  labeling_0.3     stringi_1.0-1   
## [21] scales_0.4.0     jsonlite_0.9.20  httpuv_1.3.3